Abstract. We consider two-player zero-sum games on graphs. These games can
be classified on the basis of the information of the players and on the mode of 
interaction between them. On the basis of information the classification is as fol- 
lows: (a) partial-observation (both players have partial view of the game); (b) 
one-sided complete-observation (one player has complete observation); and (c) 
complete-observation (both players have complete view of the game). On the 
basis of mode of interaction we have the following classification: (a) concurrent 
(both players interact simultaneously); and (b) turn-based (both players interact in 
turn). The two sources of randomness in these games are randomness in transition 
function and randomness in strategies. In general, randomized strategies are more 
powerful than deterministic strategies, and randomness in transitions gives more 
general classes of games. In this work we present a complete characterization for 
the classes of games where randomness is not helpful in: (a) the transition func- 
tion (probabilistic transition can be simulated by deterministic transition); and 
(b) strategies (pure strategies are as powerful as randomized strategies). As a consequence
of our characterization we obtain new undecidability results for these
games.



1 Introduction 

Games on graphs. Games played on graphs provide the mathematical framework to 
analyze several important problems in computer science as well as mathematics. In par- 
ticular, when the vertices and edges of a graph represent the states and transitions of a 
reactive system, then the synthesis problem (Church's problem) asks for the construction
of a winning strategy in a game played on a graph [5, 16, 15, 13]. Game-theoretic
formulations have also proved useful for the verification [1], refinement [10], and
compatibility checking Q of reactive systems. Games played on graphs are dynamic games
that proceed for an infinite number of rounds. In each round, the players choose moves; 
the moves, together with the current state, determine the successor state. An outcome 
of the game, called a play, consists of the infinite sequence of states that are visited. 

Strategies and objectives. A strategy for a player is a recipe that describes how the 
player chooses a move to extend a play. Strategies can be classified as follows: pure 
strategies, which always deterministically choose a move to extend the play, vs.
randomized strategies, which may choose at a state a probability distribution over the
available moves. Objectives are generally Borel measurable functions [12]: the objective for
a player is a Borel set $B$ in the Cantor topology on $S^\omega$ (where $S$ is the set of states), and
the player satisfies the objective iff the outcome of the game is a member of $B$. In
verification, objectives are usually $\omega$-regular languages. The $\omega$-regular languages generalize
the classical regular languages to infinite strings; they occur in the low levels of the
Borel hierarchy (they lie in $\Sigma_3 \cap \Pi_3$) and they form a robust and expressive language
for determining payoffs for commonly used specifications.

Classification of games. Games played on graphs can be classified according to the 
knowledge of the players about the state of the game, and the way of choosing moves. 
Accordingly, there are (a) partial-observation games, where each player only has a 
partial or incomplete view about the state and the moves of the other player; (b) one- 
sided complete-observation games, where one player has partial knowledge and the 
other player has complete knowledge about the state and moves of the other player; 
and (c) complete-observation games, where each player has complete knowledge of the 
game. According to the way of choosing moves, the games on graphs can be classi- 
fied into turn-based and concurrent games. In turn-based games, in any given round 
only one player can choose among multiple moves; effectively, the set of states can be 
partitioned into the states where it is player 1's turn to play, and the states where it is
player 2's turn. In concurrent games, both players may have multiple moves available 
at each state, and the players choose their moves simultaneously and independently.

Sources of randomness. There are two sources of randomness in these games. First is
the randomness in the transition function: given a current state and moves of the players, 
the transition function defines a probability distribution over the successor states. The 
second source of randomness is the randomness in strategies (when the players play 
randomized strategies). In this work we study when randomness can be obtained for
free; i.e., we study in which classes of games the probabilistic transition function can 
be simulated by deterministic transition function, and the classes of games where pure 
strategies are as powerful as randomized strategies. 

Motivation. The motivation to study this problem is as follows: (a) if for a class of 
games it can be shown that randomness is free for transitions, then all future works 
related to analysis of computational complexity, strategy complexity, and algorithmic 
solutions can focus on the simpler class with deterministic transitions (the randomness 
in transition may be essential for modeling appropriate stochastic reactive systems, but 
the analysis can focus on the deterministic subclass); (b) if for a class of games it can be 
shown that randomness is free for strategies, then all future works related to correctness 
results can focus on the simpler class of deterministic strategies, and the results would 
follow for the more general class of randomized strategies; and (c) the characterization 
of randomness for free will allow hardness results obtained for the more general class 
of games (such as games with randomness in transitions) to be carried over to simpler 
class of games (such as games with deterministic transitions).

Our contribution. Our contributions are as follows:

1. Randomness for free in transitions. We show that randomness in the transition function
can be obtained for free for complete-observation concurrent games (and any
class that subsumes complete-observation concurrent games) and for one-sided 
complete-observation turn-based games (and any class that subsumes this class). 
The reduction is polynomial for complete-observation concurrent games, and exponential
for one-sided complete-observation turn-based games. It is known that for
complete-observation turn-based games, a probabilistic transition function cannot
be simulated by a deterministic transition function (see discussion at end of Section 3
for details), and thus we present a complete characterization of when randomness can
be obtained for free for the transition function.

2. Randomness for free in strategies. We show that randomness in strategies is free 
for complete-observation turn-based games, and for one-player partial-observation 
games (POMDPs). For all other classes of games randomized strategies are more 
powerful than pure strategies. It follows from a result of Martin [12] that for
one-player complete-observation games with probabilistic transitions (MDPs) pure 
strategies are as powerful as randomized strategies. We present a generalization of 
this result to the case of one-player partial-observation games with probabilistic 
transitions (POMDPs). Our proof is totally different from Martin's proof and based 
on a new derandomization technique of randomized strategies. 

3. New undecidability results. As a consequence of our characterization of randomness
for free, we obtain new undecidability results. In particular, using our results
and results of Baier et al. [2] we show that for one-sided complete-observation
deterministic games, the problems of almost-sure winning for coBüchi objectives and
positive winning for Büchi objectives are undecidable. Thus we obtain the first
undecidability result for qualitative analysis (almost-sure and positive winning) of
one-sided complete-observation deterministic games with $\omega$-regular objectives.

2 Definitions 

In this section we present the definition of concurrent games of partial information and 
their subclasses, and related notions of strategies and objectives. Our model of game is 
essentially the same as in Q and is equivalent to the model of stochastic games with
signals [14, 3]. A probability distribution on a finite set $A$ is a function $\kappa : A \to [0, 1]$
such that $\sum_{a \in A} \kappa(a) = 1$. We denote by $\mathcal{D}(A)$ the set of probability distributions on
$A$.

Concurrent games of partial observation. A concurrent game of partial observation
(or simply a game) is a tuple $G = \langle S, A_1, A_2, \delta, \mathcal{O}_1, \mathcal{O}_2 \rangle$ with the following
components:

1. (State space). $S$ is a finite set of states;

2. (Actions). $A_i$ ($i = 1, 2$) is a finite set of actions for Player $i$;

3. (Probabilistic transition function). $\delta : S \times A_1 \times A_2 \to \mathcal{D}(S)$ is a concurrent
probabilistic transition function that, given a current state $s$ and actions $a_1$ and $a_2$ for
both players, gives the transition probability $\delta(s, a_1, a_2)(s')$ to the next state $s'$;

4. (Observations). $\mathcal{O}_i \subseteq 2^S$ ($i = 1, 2$) is a finite set of observations for Player $i$ that
partition the state space $S$. These partitions uniquely define functions $\mathrm{obs}_i : S \to \mathcal{O}_i$
($i = 1, 2$) that map each state to its observation such that $s \in \mathrm{obs}_i(s)$ for all
$s \in S$.
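
As an illustration of the tuple above, the following is a minimal sketch of one possible in-memory encoding. The class name `Game`, the helper names, and the toy instance are assumptions made for this example only; probabilities are stored as exact `Fraction` values so that the sum-to-one check is not affected by rounding.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, List, Tuple

State = Hashable
Action = Hashable
Dist = Dict[State, Fraction]  # probability distribution over states


def is_distribution(dist: Dist) -> bool:
    """Check that all weights lie in [0, 1] and sum to exactly 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1


def is_partition(blocks: List[FrozenSet[State]], states: FrozenSet[State]) -> bool:
    """Check that the blocks are non-empty, pairwise disjoint, and cover `states`."""
    seen = set()
    for block in blocks:
        if not block or seen & block:
            return False
        seen |= block
    return seen == set(states)


@dataclass
class Game:
    """Hypothetical container for <S, A1, A2, delta, O1, O2>."""
    states: FrozenSet[State]
    actions1: FrozenSet[Action]
    actions2: FrozenSet[Action]
    delta: Dict[Tuple[State, Action, Action], Dist]
    obs1: List[FrozenSet[State]]
    obs2: List[FrozenSet[State]]

    def validate(self) -> bool:
        """delta must be total and distribution-valued; O1 and O2 must partition S."""
        total = all(
            (s, a, b) in self.delta
            and is_distribution(self.delta[(s, a, b)])
            and set(self.delta[(s, a, b)]) <= set(self.states)
            for s in self.states for a in self.actions1 for b in self.actions2
        )
        return (total and is_partition(self.obs1, self.states)
                and is_partition(self.obs2, self.states))

    def obs(self, player: int, s: State) -> FrozenSet[State]:
        """Return the unique observation of `player` that contains state s."""
        blocks = self.obs1 if player == 1 else self.obs2
        return next(o for o in blocks if s in o)


# Toy instance (made up for illustration): Player 1 is blind, Player 2 sees everything.
half = Fraction(1, 2)
toy = Game(
    states=frozenset({"s", "t"}),
    actions1=frozenset({"a"}),
    actions2=frozenset({"b"}),
    delta={("s", "a", "b"): {"s": half, "t": half},
           ("t", "a", "b"): {"t": Fraction(1)}},
    obs1=[frozenset({"s", "t"})],
    obs2=[frozenset({"s"}), frozenset({"t"})],
)
```

In this toy instance Player 2 has complete observation, so the game is one-sided complete-observation in the sense defined in the special cases.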

Special cases. We consider the following special cases of partial observation concurrent 
games, obtained either by restrictions in the observations, the mode of selection of 
moves, the type of transition function, or the number of players:



- (Observation restriction). The games with one-sided complete-observation are the
special case of games where $\mathcal{O}_1 = \{\{s\} \mid s \in S\}$ (i.e., Player 1 has complete
observation) or $\mathcal{O}_2 = \{\{s\} \mid s \in S\}$ (Player 2 has complete observation).
The games of complete-observation are the special case of games where
$\mathcal{O}_1 = \mathcal{O}_2 = \{\{s\} \mid s \in S\}$, i.e., every state is visible to each player and hence both
players have complete observation. If a player has complete observation we omit
the corresponding observation sets from the description of the game.

- (Mode of interaction restriction). A turn-based state is a state $s$ such that either (i)
$\delta(s, a, b) = \delta(s, a, b')$ for all $a \in A_1$ and all $b, b' \in A_2$ (i.e., the action of Player 1
determines the transition function and hence it can be interpreted as Player 1's turn
to play); we refer to $s$ as a Player-1 state, and we use the notation $\delta(s, a, -)$; or
(ii) $\delta(s, a, b) = \delta(s, a', b)$ for all $a, a' \in A_1$ and all $b \in A_2$; we refer to $s$ as a
Player-2 state, and we use the notation $\delta(s, -, b)$. A state $s$ which is both a Player-1
state and a Player-2 state is called a probabilistic state (i.e., the transition function
is independent of the actions of the players). We write $\delta(s, -, -)$ to denote the
transition function in $s$. The turn-based games are the special case of games where
all states are turn-based.

- (Transition function restriction). The deterministic games are the special case of
games where for all states $s \in S$ and actions $a \in A_1$ and $b \in A_2$, there exists a state
$s' \in S$ such that $\delta(s, a, b)(s') = 1$. We refer to such states $s$ as deterministic states.
For deterministic games, it is often convenient to assume that $\delta : S \times A_1 \times A_2 \to S$.

- (Player restriction). The $1\frac{1}{2}$-player games, also called partially observable
Markov decision processes (or POMDP), are the special case of games where $A_1$
or $A_2$ is a singleton. Note that $1\frac{1}{2}$-player games are turn-based. Games without
player restriction are sometimes called $2\frac{1}{2}$-player games.

The $1\frac{1}{2}$-player games of complete-observation are Markov decision processes (or
MDP), and $1\frac{1}{2}$-player deterministic games can be viewed as graphs (and are often
called one-player games).
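
The state classifications above are simple syntactic checks on the transition function. The sketch below illustrates them under two assumptions that are not part of the definitions: the transition function is a dictionary keyed by `(state, action1, action2)`, and each distribution lists only successors with positive probability, so that dictionary equality coincides with equality of distributions. All function names and the three-state example are hypothetical.

```python
from fractions import Fraction
from itertools import product


def is_player1_state(delta, s, A1, A2):
    """Player-1 state: the distribution at s never depends on Player 2's action."""
    return all(delta[(s, a, b)] == delta[(s, a, b2)]
               for a in A1 for b, b2 in product(A2, repeat=2))


def is_player2_state(delta, s, A1, A2):
    """Player-2 state: the distribution at s never depends on Player 1's action."""
    return all(delta[(s, a, b)] == delta[(s, a2, b)]
               for b in A2 for a, a2 in product(A1, repeat=2))


def is_turn_based_state(delta, s, A1, A2):
    return is_player1_state(delta, s, A1, A2) or is_player2_state(delta, s, A1, A2)


def is_probabilistic_state(delta, s, A1, A2):
    """Both a Player-1 and a Player-2 state: no player influences the transition."""
    return is_player1_state(delta, s, A1, A2) and is_player2_state(delta, s, A1, A2)


def is_deterministic_state(delta, s, A1, A2):
    """Every action pair leads to a single successor with probability 1."""
    return all(any(p == 1 for p in delta[(s, a, b)].values())
               for a in A1 for b in A2)


# Made-up example: s is a Player-1 state, t is concurrent, u is probabilistic.
half = Fraction(1, 2)
S, A1, A2 = ["s", "t", "u"], ["a1", "a2"], ["b1", "b2"]
delta = {}
for a, b in product(A1, A2):
    delta[("s", a, b)] = {"t": Fraction(1)} if a == "a1" else {"u": Fraction(1)}
    agree = (a, b) in {("a1", "b1"), ("a2", "b2")}
    delta[("t", a, b)] = {"t": Fraction(1)} if agree else {"u": Fraction(1)}
    delta[("u", a, b)] = {"s": half, "t": half}

game_is_turn_based = all(is_turn_based_state(delta, s, A1, A2) for s in S)
game_is_deterministic = all(is_deterministic_state(delta, s, A1, A2) for s in S)
```

Here the game is neither turn-based (because of `t`) nor deterministic (because of `u`).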

Classes of game graphs. We will use the following abbreviations: Pa
for partial observation, Os for one-sided complete-observation, Co for complete-observation,
C for concurrent, and T for turn-based. For example, CoC will denote
complete-observation concurrent games, and OsT will denote one-sided complete-observation
turn-based games. For $\mathcal{C} \in \{\mathrm{Pa}, \mathrm{Os}, \mathrm{Co}\} \times \{\mathrm{C}, \mathrm{T}\}$, we denote by $\mathcal{G}_{\mathcal{C}}$ the
set of all $\mathcal{C}$ games. Note the following strict inclusions: partial observation (Pa) is
more general than one-sided complete-observation (Os), Os is more general than
complete-observation (Co), and concurrent (C) is more general than turn-based (T). We
will denote by $\mathcal{G}_D$ the set of all games with deterministic transition function.

Plays. In a game structure, in each turn, Player 1 chooses an action $a \in A_1$, Player 2
chooses an action $b \in A_2$, and the successor of the current state $s$ is chosen according
to the probabilistic transition function $\delta(s, a, b)$. A play in $G$ is an infinite sequence of
states $\rho = s_0 s_1 \ldots$ such that for all $i \geq 0$, there exist $a_i \in A_1$ and $b_i \in A_2$ with
$\delta(s_i, a_i, b_i)(s_{i+1}) > 0$. The prefix up to $s_n$ of the play $\rho$ is denoted by $\rho(n)$, its length
is $|\rho(n)| = n + 1$ and its last element is $\mathrm{Last}(\rho(n)) = s_n$. The set of plays in $G$
is denoted $\mathrm{Plays}(G)$, and the set of corresponding finite prefixes is denoted $\mathrm{Prefs}(G)$.
The observation sequence of $\rho$ for player $i$ ($i = 1, 2$) is the unique infinite sequence
$\mathrm{obs}_i(\rho) = o_0 o_1 \ldots \in \mathcal{O}_i^\omega$ such that $s_j \in o_j$ for all $j \geq 0$.

Strategies. A pure strategy in $G$ for Player 1 is a function $\sigma : \mathrm{Prefs}(G) \to A_1$. A
randomized strategy in $G$ for Player 1 is a function $\sigma : \mathrm{Prefs}(G) \to \mathcal{D}(A_1)$. A (pure
or randomized) strategy $\sigma$ for Player 1 is observation-based if for all prefixes
$\rho, \rho' \in \mathrm{Prefs}(G)$, if $\mathrm{obs}_1(\rho) = \mathrm{obs}_1(\rho')$, then $\sigma(\rho) = \sigma(\rho')$. We omit analogous definitions
of strategies for Player 2. We denote by $\Sigma_G$, $\Sigma_G^O$, $\Sigma_G^P$, $\Pi_G$, $\Pi_G^O$, and $\Pi_G^P$ the set of
all Player-1 strategies, the set of all observation-based Player-1 strategies, the set of all
pure Player-1 strategies, the set of all Player-2 strategies in $G$, the set of all observation-based
Player-2 strategies, and the set of all pure Player-2 strategies, respectively. Note
that if Player 1 has complete observation, then $\Sigma_G^O = \Sigma_G$.

Objectives. An objective for Player 1 in $G$ is a set $\phi \subseteq S^\omega$ of infinite sequences of states.
A play $\rho \in \mathrm{Plays}(G)$ satisfies the objective $\phi$, denoted $\rho \models \phi$, if $\rho \in \phi$. Objectives are
generally Borel measurable: a Borel objective is a Borel set in the Cantor topology on
$S^\omega$ [11]. We specifically consider $\omega$-regular objectives specified as parity objectives
(a canonical form to express all $\omega$-regular objectives [17]). For a play $\rho = s_0 s_1 \ldots$
we denote by $\mathrm{Inf}(\rho)$ the set of states that occur infinitely often in $\rho$, that is,
$\mathrm{Inf}(\rho) = \{s \mid s_j = s \text{ for infinitely many } j\}$. For $d \in \mathbb{N}$, let $p : S \to \{0, 1, \ldots, d\}$ be a
priority function, which maps each state to a non-negative integer priority. The parity
objective $\mathrm{Parity}(p)$ requires that the minimum priority that occurs infinitely often be
even. Formally, $\mathrm{Parity}(p) = \{\rho \mid \min\{p(s) \mid s \in \mathrm{Inf}(\rho)\} \text{ is even}\}$. The Büchi and
coBüchi objectives are the special cases of parity objectives with two priorities,
$p : S \to \{0, 1\}$ and $p : S \to \{1, 2\}$ respectively. We say that an objective $\phi$ is visible for Player $i$
if for all $\rho, \rho' \in S^\omega$, if $\rho \models \phi$ and $\mathrm{obs}_i(\rho) = \mathrm{obs}_i(\rho')$, then $\rho' \models \phi$. For example, if the
priority function maps observations to priorities (i.e., $p : \mathcal{O}_i \to \{0, 1, \ldots, d\}$), then the
parity objective is visible for Player $i$.
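
To make the parity condition concrete, the sketch below evaluates it on ultimately periodic plays of the form prefix followed by a cycle repeated forever. This restriction is an assumption of the example (general plays have no finite representation). For such a play the set of states occurring infinitely often is exactly the set of states on the cycle. The function names and state names are hypothetical.

```python
def inf_states(prefix, cycle):
    """States visited infinitely often by the play prefix . cycle^omega."""
    if not cycle:
        raise ValueError("the repeated part of the play must be non-empty")
    return set(cycle)


def satisfies_parity(prefix, cycle, priority):
    """True iff the minimum priority seen infinitely often is even."""
    return min(priority[s] for s in inf_states(prefix, cycle)) % 2 == 0


# Buchi objective: priorities in {0, 1}; the target state gets priority 0.
buchi = {"s1": 1, "s4": 0}
# coBuchi objective: priorities in {1, 2}; states with priority 1 must be
# visited only finitely often.
cobuchi = {"x": 1, "y": 2}

reaches_target = satisfies_parity(["s1"], ["s4"], buchi)        # stays in s4 forever
leaves_target = satisfies_parity(["s4"], ["s1"], buchi)         # sees s4 only once
avoids_bad = satisfies_parity(["x"], ["y"], cobuchi)            # x seen finitely often
keeps_seeing_bad = satisfies_parity([], ["x", "y"], cobuchi)    # x seen infinitely often
```

Only the cycle matters for the verdict, which reflects that parity objectives are insensitive to any finite prefix of the play.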

Almost-sure winning, positive winning and value function. An event is a measurable set
of plays, and given strategies $\sigma$ and $\pi$ for the two players, the probabilities of events are
uniquely defined [18]. For a Borel objective $\phi$, we denote by $\Pr_s^{\sigma,\pi}(\phi)$ the probability
that $\phi$ is satisfied by the play obtained from the starting state $s$ when the strategies $\sigma$ and
$\pi$ are used. Given a game structure $G$ and a state $s$, an observation-based strategy $\sigma$ for
Player 1 is almost-sure winning (almost winning in short) (resp. positive winning) for
the objective $\phi$ from $s$ if for all observation-based randomized strategies $\pi$ for Player 2,
we have $\Pr_s^{\sigma,\pi}(\phi) = 1$ (resp. $\Pr_s^{\sigma,\pi}(\phi) > 0$). The value function
$\langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi) : S \to \mathbb{R}$
for Player 1 and objective $\phi$ assigns to every state the maximal probability with which
Player 1 can guarantee the satisfaction of $\phi$ with an observation-based strategy, against
all observation-based strategies for Player 2. Formally we have

$$\langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi)(s) = \sup_{\sigma \in \Sigma_G^O} \inf_{\pi \in \Pi_G^O} \Pr_s^{\sigma,\pi}(\phi).$$

For $\varepsilon \geq 0$, an observation-based strategy $\sigma$ is $\varepsilon$-optimal for $\phi$ from $s$ if we have
$\inf_{\pi \in \Pi_G^O} \Pr_s^{\sigma,\pi}(\phi) \geq \langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi)(s) - \varepsilon$. An optimal strategy is a 0-optimal strategy.

Example 1. Consider the game with one-sided complete observation (Player 2 has complete
information) shown in Fig. 1. Consider the Büchi objective defined by the state
$s_4$ (i.e., state $s_4$ has priority 0 and other states have priority 1). Because Player 1 has
partial observation (given by the partition $\mathcal{O}_1 = \{\{s_1\}, \{s_2, s_2'\}, \{s_3, s_3'\}, \{s_4\}\}$), she
cannot distinguish between $s_2$ and $s_2'$ and therefore has to play the same actions with
the same probabilities in $s_2$ and $s_2'$ (while it would be easy to win by playing $a_2$ in $s_2$ and $a_1$
in $s_2'$, this is not possible). In fact, Player 1 cannot win using a pure observation-based
strategy. However, playing $a_1$ and $a_2$ uniformly at random in all states is almost-sure
winning. Every time the game visits observation $o_2$, for any strategy of Player 2, the
game visits $s_3$ and $s_3'$ with probability $\frac{1}{2}$, and hence also reaches $s_4$ with probability
$\frac{1}{2}$. It follows that against all Player 2 strategies the play eventually reaches $s_4$ with
probability 1, and then stays there.

Fig. 1. A game with one-sided complete observation.

Fig. 2. The various classes of game graphs. The curves materialize the classes for which
randomness is for free in transition relation (Theorem 2 and Theorem 3). For $2\frac{1}{2}$-player
games, randomness is not free only in complete-observation turn-based games.

3 Randomness for Free in Transition Function 

In this section we present a precise characterization of the classes of games where the 
randomness in transition function can be obtained for free: in other words, we present 
the precise characterization of classes of games with probabilistic transition function 
that can be reduced to the corresponding class with deterministic transition function. 
We present our results as three reductions: (a) the first reduction allows us to separate 
probability from the mode of interaction; (b) the second reduction shows how to simulate
probability in transition function with CoC (complete-observation concurrent) deterministic
transition; and (c) the final reduction shows how to simulate probability in
transition with OsT (one-sided complete-observation turn-based) deterministic transition.
All our reductions are local: they consist of a gadget construction and replacement
locally at every state. Our reductions preserve values, existence of $\varepsilon$-optimal strategies
for $\varepsilon \geq 0$, and also existence of almost-sure and positive winning strategies. A visual
overview is given in Fig. 2.

3.1 Separation of probability and interaction 

A concurrent probabilistic game of partial observation $G$ satisfies the interaction separation
condition if the following restrictions are satisfied (see also Fig. 4): the state
space $S$ can be partitioned into $(S_A, S_P)$ such that (1) $\delta : S_A \times A_1 \times A_2 \to S_P$, and
(2) $\delta : S_P \times A_1 \times A_2 \to \mathcal{D}(S_A)$ such that for all $s \in S_P$ and all $s' \in S_A$, and for
all $a_1, a_2, a_1', a_2'$ we have $\delta(s, a_1, a_2)(s') = \delta(s, a_1', a_2')(s') = \delta(s, -, -)(s')$. In other
words, the choice of actions (or the interaction) of the players takes place at states in $S_A$
and actions determine a unique successor state in $S_P$, and the transition function at $S_P$
is probabilistic and independent of the choice of the players. In this section, we reduce
a class of games to the corresponding class satisfying interaction separation.

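Reduction to interaction separation. Let $G = \langle S, A_1, A_2, \delta, \mathcal{O}_1, \mathcal{O}_2 \rangle$ be a concurrent
game of partial observation with an objective $\phi$. We obtain a concurrent game of partial
observation $\overline{G} = \langle S_A \cup S_P, A_1, A_2, \overline{\delta}, \overline{\mathcal{O}}_1, \overline{\mathcal{O}}_2 \rangle$ where $S_A = S$, $S_P = S \times A_1 \times A_2$,
and:

- Observation. For $i \in \{1, 2\}$, if $\mathcal{O}_i = \{\{s\} \mid s \in S\}$, then
$\overline{\mathcal{O}}_i = \{\{s'\} \mid s' \in S_A \cup S_P\}$; otherwise $\overline{\mathcal{O}}_i$ contains the observation
$o \cup \{(s, a_1, a_2) \mid s \in o\}$ for each $o \in \mathcal{O}_i$.

- Transition function. The transition function $\overline{\delta}$ is as follows:

1. We have the following three cases: (a) if $s$ is a Player 1 turn-based state, then
pick an action $a_2^*$ and for all $a_2$ let $\overline{\delta}(s, a_1, a_2) = (s, a_1, a_2^*)$; (b) if $s$ is a
Player 2 turn-based state, then pick an action $a_1^*$ and for all $a_1$ let
$\overline{\delta}(s, a_1, a_2) = (s, a_1^*, a_2)$; and (c) otherwise, $\overline{\delta}(s, a_1, a_2) = (s, a_1, a_2)$;

2. for all $(s, a_1, a_2) \in S_P$ we have $\overline{\delta}((s, a_1, a_2), -, -)(s') = \delta(s, a_1, a_2)(s')$.
Thus the states in $S$ are $S_A$ where the interaction takes place, and the states in
$S \times A_1 \times A_2$ are the purely probabilistic states $S_P$.

- Objective mapping. Given the objective $\phi$ in $G$ we obtain the objective
$\overline{\phi} = \{\langle s_0 s_0' s_1 s_1' \ldots \rangle \mid \langle s_0 s_1 \ldots \rangle \in \phi\}$ in $\overline{G}$.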

It is easy to map observation-based strategies of the game $G$ to observation-based strategies
in $\overline{G}$ and vice-versa such that satisfaction of $\phi$ and $\overline{\phi}$ in $G$ and $\overline{G}$, respectively, is preserved.
Let us refer to the above reduction as Reduction: i.e., $\mathrm{Reduction}(G, \phi) = (\overline{G}, \overline{\phi})$. Then
we have the following theorem.
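
The reduction to interaction separation is a purely local rewriting of the transition function and of the observations, which the following sketch tries to make explicit. It is an illustration under stated assumptions, not the authors' implementation: the transition function is a dictionary keyed by `(state, action1, action2)` whose distributions list only positive-probability successors, the "picked" actions $a_1^*$ and $a_2^*$ are taken to be the first action of each list, and the function name `separate_interaction` is hypothetical. The objective mapping is not covered.

```python
from fractions import Fraction


def separate_interaction(S, A1, A2, delta, obs1, obs2):
    """Split every state s into an interaction state s and probabilistic states (s, a, b)."""
    S_A = list(S)
    S_P = [(s, a, b) for s in S for a in A1 for b in A2]
    a_star, b_star = A1[0], A2[0]  # assumption: any fixed choice of picked actions works
    new_delta = {}
    for s in S:
        # Player-1 state: the distribution does not depend on Player 2's action.
        p1_turn = all(delta[(s, a, b)] == delta[(s, a, b_star)] for a in A1 for b in A2)
        # Player-2 state: the distribution does not depend on Player 1's action.
        p2_turn = all(delta[(s, a, b)] == delta[(s, a_star, b)] for a in A1 for b in A2)
        for a in A1:
            for b in A2:
                if p1_turn:        # case (a): ignore Player 2's action
                    target = (s, a, b_star)
                elif p2_turn:      # case (b): ignore Player 1's action
                    target = (s, a_star, b)
                else:              # case (c): genuinely concurrent state
                    target = (s, a, b)
                new_delta[(s, a, b)] = {target: Fraction(1)}
    # Probabilistic states replay the original distribution, whatever the players do.
    for (s, a, b) in S_P:
        for a2 in A1:
            for b2 in A2:
                new_delta[((s, a, b), a2, b2)] = dict(delta[(s, a, b)])

    def lift(obs):
        """Complete observation stays complete; otherwise (s, a, b) looks like s."""
        if all(len(o) == 1 for o in obs):
            return [frozenset([x]) for x in S_A + S_P]
        return [frozenset(o) | frozenset(t for t in S_P if t[0] in o) for o in obs]

    return S_A, S_P, new_delta, lift(obs1), lift(obs2)


# Made-up two-state example: s is concurrent, t is a turn-based (absorbing) state.
half = Fraction(1, 2)
S, A1, A2 = ["s", "t"], ["a1", "a2"], ["b1", "b2"]
delta = {}
for a in A1:
    for b in A2:
        mixed = (a, b) == ("a1", "b1")
        delta[("s", a, b)] = {"s": half, "t": half} if mixed else {"t": Fraction(1)}
        delta[("t", a, b)] = {"t": Fraction(1)}
obs1 = [frozenset({"s", "t"})]                 # Player 1 is blind
obs2 = [frozenset({"s"}), frozenset({"t"})]    # Player 2 has complete observation

S_A, S_P, new_delta, new_obs1, new_obs2 = separate_interaction(S, A1, A2, delta, obs1, obs2)
```

In the result, every transition out of a state of `S_A` is deterministic, and every state of `S_P` has a distribution that is independent of both actions, as required by the interaction separation condition.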

Theorem 1. Let $G$ be a concurrent game of partial observation with an objective $\phi$,
and let $(\overline{G}, \overline{\phi}) = \mathrm{Reduction}(G, \phi)$. Then the following assertions hold:

1. The reduction Reduction is restriction preserving: if $G$ is one-sided complete-observation,
then so is $\overline{G}$; if $G$ is complete-observation, then so is $\overline{G}$; if $G$ is turn-based,
then so is $\overline{G}$.

2. For all $s \in S$, there is an observation-based almost-sure (resp. positive) winning
strategy for $\phi$ from $s$ in $G$ iff there is an observation-based almost-sure (resp. positive)
winning strategy for $\overline{\phi}$ from $s$ in $\overline{G}$.

3. The reduction is objective preserving: if $\phi$ is a parity objective, then so is $\overline{\phi}$; if $\phi$ is
an objective in the $k$-th level of the Borel hierarchy, then so is $\overline{\phi}$.

4. For all $s \in S$ we have $\langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi)(s) = \langle\!\langle 1 \rangle\!\rangle_{val}^{\overline{G}}(\overline{\phi})(s)$. For all $s \in S$ there is an
observation-based optimal strategy for $\phi$ from $s$ in $G$ iff there is an observation-based
optimal strategy for $\overline{\phi}$ from $s$ in $\overline{G}$.

Since the reduction is restriction preserving, we have a reduction that separates the 
interaction and probabilistic transition maintaining the restriction of observation and 
mode of interaction. 

Uniform-n-ary concurrent probabilistic games. The class of uniform-n-ary probabilistic
games is the special class of probabilistic games such that every state $s \in S_P$
has $n$ successors and the transition probability to each successor is $\frac{1}{n}$. It follows from
the results of [19] that every CoC probabilistic game with rational transition probabilities
can be reduced in polynomial time to an equivalent polynomial size uniform-binary
(i.e., $n = 2$) CoC probabilistic game for all parity objectives. The reduction is achieved
by adding dummy states to simulate the probability, and the reduction extends to all
objectives (in the reduced game we need to consider the objective whose projection in
the original game gives the original objective).

In the case of partial information, the reduction to uniform-binary probabilistic
games of [19] is not valid. To see this, consider Fig. 3 where two probabilistic states
$s_1, s_2$ have the same observation (i.e., $\mathrm{obs}_1(s_1) = \mathrm{obs}_1(s_2)$) and the outgoing probabilities
are (i, |) from $s_1$ and (i, |) from $s_2$. The corresponding uniform-binary game
(given in Fig. 3) is not equivalent to the original game because the number of steps
needed to simulate the probabilities is not always the same from $s_1$ and from $s_2$. From
$s_1$ two steps are always sufficient, while from $s_2$ more than two steps may be necessary
(with probability j). Therefore with probability i, Player 1 observing more than 2 steps
would infer that the game was for sure in $s_2$, thus artificially improving his knowledge
and increasing his value function.

Therefore in the case of partial observation, we can only reduce a probabilistic game
$G$ to a uniform-n-ary probabilistic game with $n = 1/r$ where $r$ is the greatest common
divisor of all probabilities in the original game $G$ (a rational $r$ is a divisor of a rational $p$
if $p = q \cdot r$ for some integer $q$). Note that the number $n = 1/r$ is an integer. We denote
by $[n]$ the set $\{0, 1, \ldots, n - 1\}$. For a probabilistic state $s \in S_P$, we define the $n$-tuple
$\mathrm{Succ}(s) = (s_0', \ldots, s_{n-1}')$ in which each state $s' \in S$ occurs $n \cdot \delta(s, -, -)(s')$ times.
Then, we can view the transition relation $\delta(s, -, -)$ as a function assigning the same
probability $r = 1/n$ to each element of $\mathrm{Succ}(s)$ (and then adding up the probabilities
of identical elements).
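
The computation of $n$ and of the tuples $\mathrm{Succ}(s)$ can be sketched as follows. The function names are hypothetical, the two distributions are made-up values chosen for illustration (they are not taken from the figures), and the order of elements inside the tuple is an arbitrary convention of this sketch (sorted state names). The rational greatest common divisor $r$ is obtained by scaling all probabilities to a common denominator.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def uniform_arity(dists):
    """Return n = 1/r, where r is the rational gcd of all positive probabilities."""
    probs = [p for d in dists for p in d.values() if p > 0]
    # Least common multiple of all denominators.
    common_den = reduce(lambda x, y: x * y // gcd(x, y),
                        (p.denominator for p in probs), 1)
    # gcd of the numerators once every probability is written over common_den.
    g = reduce(gcd, (int(p * common_den) for p in probs), 0)
    # r = g / common_den, hence n = common_den / g (an integer because each
    # distribution sums to 1, so g divides common_den).
    return common_den // g


def succ_tuple(dist, n):
    """n-tuple in which each successor s' occurs n * dist[s'] times."""
    out = []
    for s in sorted(dist):
        count = dist[s] * n
        assert count.denominator == 1, "n must make every multiplicity an integer"
        out.extend([s] * int(count))
    return tuple(out)


# Hypothetical distributions of two probabilistic states sharing an observation.
d1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
d2 = {"x": Fraction(1, 3), "y": Fraction(2, 3)}
n = uniform_arity([d1, d2])
succ1 = succ_tuple(d1, n)
succ2 = succ_tuple(d2, n)
```

Both tuples have the same length `n`, so the two states offer the same number of successors and cannot be told apart by counting them.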

Note that the above reduction is worst-case exponential (because the least
common multiple of all probability denominators can be exponential). This is necessary to have the
property that all probabilistic states in the game have the same number of successors. We



Fig. 3. An example showing why the uniform-binary reduction cannot be used with 
partial observation. 




Fig. 4. Example of interaction separation for S{s,ai,bi){si) = ^ and 
5{s,ai,bi){s2) = |. 

will see that this property is crucial because it determines the number of actions available
to Player 1 in the reductions presented in Sections 3.2 and 3.3, and the number of
available actions should not differ in states that have the same observation.

3.2 Simulation of probability with complete-observation concurrent 
determinism 

In this section, we show that probabilistic states can be simulated by CoC deterministic 
gadgets (and hence also by OsC and PaC deterministic gadgets). By Theorem 1 we
focus on games that satisfy interaction separation. 

Theorem 2. Let $a \in \{\mathrm{Pa}, \mathrm{Os}, \mathrm{Co}\}$ and $b \in \{\mathrm{C}, \mathrm{T}\}$, and let $\mathcal{C} = ab$ and $\mathcal{C}' = a\mathrm{C}$. Let
$G$ be a game in $\mathcal{G}_{\mathcal{C}}$ with probabilistic transition function with rational probabilities and
an objective $\phi$. A game $G' \in \mathcal{G}_{\mathcal{C}'} \cap \mathcal{G}_D$ (in the class that subsumes $\mathcal{G}_{\mathcal{C}}$ with concurrent
interaction) with deterministic transition function can be constructed in (a) polynomial
time if $a = \mathrm{Co}$, and (b) exponential time if $a = \mathrm{Pa}$ or $\mathrm{Os}$, with an objective $\phi'$ such
that the state space of $G$ is a subset of the state space of $G'$ and the following assertions
hold.

1. For all $s \in S$ there is an observation-based almost-sure (resp. positive) winning
strategy from $s$ for $\phi$ in $G$ iff there is an observation-based almost-sure (resp. positive)
winning strategy for $\phi'$ from $s$ in $G'$.

2. For all $s \in S$ we have $\langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi)(s) = \langle\!\langle 1 \rangle\!\rangle_{val}^{G'}(\phi')(s)$. For all $s \in S$ there is an
observation-based optimal strategy for $\phi$ from $s$ in $G$ iff there is an observation-based
optimal strategy for $\phi'$ from $s$ in $G'$.

Proof. To prove the desired result we show how a uniform-n-ary probabilistic state
can be simulated by a CoC deterministic gadget. For simplicity we present the details
for the case when $n = 2$, and the gadget for the general case is given in the Appendix.
Our reduction will be as follows: we consider a uniform-binary CoC probabilistic game 
such that there is only one probabilistic state, and reduce it to a CoC deterministic 
game. For uniform-binary CoC probabilistic games with multiple probabilistic states 
the reduction can be applied to each state one at a time and we would obtain the desired 
reduction from uniform-binary CoC probabilistic games to CoC deterministic games. 
Hence we prove the following claim. 

Claim. Consider a uniform-binary CoC probabilistic game $G$ with a single probabilistic
state $s^*$ with two successors $s_1$ and $s_2$. Consider the CoC deterministic game
$G'$ obtained from $G$ by transforming the state $s^*$ to a concurrent deterministic state
as follows: the actions available for player 1 at $s^*$ are $a_1$ and $a_2$ and the actions
available for player 2 at $s^*$ are $b_1$ and $b_2$; and the transition function is as follows:
$\delta'(s^*, a_1, b_1) = \delta'(s^*, a_2, b_2) = s_1$ and $\delta'(s^*, a_1, b_2) = \delta'(s^*, a_2, b_1) = s_2$. Then for all
objectives $\phi$, the following assertions hold.

1. For all $s \in S$ there is an observation-based almost-sure (resp. positive) winning
strategy from $s$ for $\phi$ in $G$ iff there is an observation-based almost-sure (resp. positive)
winning strategy for $\phi$ from $s$ in $G'$.

2. For all $s \in S$ we have $\langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi)(s) = \langle\!\langle 1 \rangle\!\rangle_{val}^{G'}(\phi)(s)$. For all $s \in S$ there is an
observation-based optimal strategy for $\phi$ from $s$ in $G$ iff there is an observation-based
optimal strategy for $\phi$ from $s$ in $G'$.

The reduction is illustrated in Figure 5. We prove the claim as follows. Let the value
for the objective for player 1 at a state $s$ be $v(s)$ and $v'(s)$ in $G$ and $G'$, respectively, and
let the value for player 2 be $w(s)$ and $w'(s)$ in $G$ and $G'$, respectively. By determinacy
of CoC games [12] we have $w(s) = 1 - v(s)$ and $w'(s) = 1 - v'(s)$. We present two
inequalities to complete the proof.

1. Consider a strategy $\pi$ for player 2 in $G$; we construct a strategy $\pi'$ for player 2
in $G'$ as follows: the strategy $\pi'$ follows the strategy $\pi$ for all histories other than
when the current state is $s^*$; and if the current state is $s^*$, then the strategy $\pi'$ plays the
actions $b_1$ and $b_2$ uniformly with probability $\frac{1}{2}$. Given the strategy $\pi'$, if the current
state is $s^*$, then for any probability distribution over $a_1$ and $a_2$, the successor states
are $s_1$ and $s_2$ with probability $\frac{1}{2}$ each (i.e., it plays exactly the role of state $s^*$ in $G$). It
follows that the value for player 1 in $G'$ is no more than the value in $G$, i.e., for all
$s$ we have $v'(s) \leq v(s)$.

2. Consider a strategy $\sigma$ for player 1 in $G$; we construct a strategy $\sigma'$ for player 1 in
$G'$ as follows: the strategy $\sigma'$ follows the strategy $\sigma$ for all histories other than when
the current state is $s^*$, and if the current state is $s^*$, then the strategy $\sigma'$ plays the
actions $a_1$ and $a_2$ uniformly with probability $\frac{1}{2}$. Given the strategy $\sigma'$, if the current
state is $s^*$, then for any probability distribution over $b_1$ and $b_2$, the successor states
are $s_1$ and $s_2$ with probability $\frac{1}{2}$ each (i.e., it plays exactly the role of state $s^*$ in $G$). It
follows that the value for player 2 in $G'$ is no more than the value in $G$, i.e., for all
$s$ we have $w'(s) \leq w(s)$.

The second inequality gives $1 - v'(s) \leq 1 - v(s)$, i.e., $v(s) \leq v'(s)$; together with the first
inequality it follows that $v(s) = v'(s)$ for all states $s$, and the desired result follows.
Observe that the reduction also ensures that from an optimal strategy in $G$ we can
construct an optimal strategy in $G'$ and vice-versa. Our proof shows how probabilistic
states can be simulated by CoC deterministic states, and it follows that probabilistic
states can be simulated by OsC deterministic states and PaC deterministic states. The
result follows. □

Fig. 5. The reduction of uniform-binary CoC probabilistic games.
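
The key step of the proof for $n = 2$, that either player can unilaterally enforce the uniform distribution at the gadget state $s^*$, can be checked numerically. In the sketch below, index 0 stands for $a_1$ (resp. $b_1$) and index 1 for $a_2$ (resp. $b_2$); the function names are hypothetical and the opposing mixed strategies are sampled on a grid of exact rationals chosen only for this illustration.

```python
from fractions import Fraction


def gadget_successor(i, j):
    """Deterministic transition at s*: s1 if the two chosen indices agree, s2 otherwise."""
    return "s1" if i == j else "s2"


def successor_distribution(mix1, mix2):
    """Distribution over {s1, s2} when the players mix independently at s*."""
    dist = {"s1": Fraction(0), "s2": Fraction(0)}
    for i, p in enumerate(mix1):
        for j, q in enumerate(mix2):
            dist[gadget_successor(i, j)] += p * q
    return dist


half = Fraction(1, 2)
uniform = [half, half]
target = {"s1": half, "s2": half}

# Whatever mix (q, 1 - q) the opponent uses, uniform play yields the fair coin.
opponent_mixes = [[Fraction(k, 10), 1 - Fraction(k, 10)] for k in range(11)]
player1_enforces = all(successor_distribution(uniform, m) == target for m in opponent_mixes)
player2_enforces = all(successor_distribution(m, uniform) == target for m in opponent_mixes)

# If both players play pure actions, the successor is fully determined.
pure_outcome = successor_distribution([1, 0], [0, 1])
```

The check mirrors the two inequalities of the proof: since each player can force the distribution $(\frac{1}{2}, \frac{1}{2})$ alone, neither can obtain more in the gadget than in the original probabilistic state.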



3.3 Simulation of probability with one-sided complete-observation turn-based 
determinism 

We show that probabilistic states can be simulated by OsT (one-sided complete-observation
turn-based) states, and by Theorem 1 we consider games that satisfy interaction
separation. The reduction is illustrated in Fig. 6: each probabilistic state $s$ is
transformed into a Player-2 state with $n$ successor Player-1 states (where $n$ is chosen
such that the probabilities in $s$ are integer multiples of $1/n$; here $n = 3$). Because all
successors of $s$ have the same observation, Player 1 has no advantage in playing after
Player 2, and because by playing all actions uniformly at random each player can unilaterally
decide to simulate the probabilistic state, the value and properties of strategies
of the game are preserved.

Theorem 3. Let $a \in \{\mathrm{Pa}, \mathrm{Os}, \mathrm{Co}\}$ and $b \in \{\mathrm{C}, \mathrm{T}\}$, and let $a' = a$ if $a \neq \mathrm{Co}$, and
$a' = \mathrm{Os}$ otherwise. Let $\mathcal{C} = ab$ and $\mathcal{C}' = a'b$. Let $G$ be a game in $\mathcal{G}_{\mathcal{C}}$ with probabilistic
transition function with rational transition probabilities and an objective $\phi$. A game
$G' \in \mathcal{G}_{\mathcal{C}'} \cap \mathcal{G}_D$ (in the class that subsumes one-sided complete-observation turn-based
games and the class $\mathcal{G}_{\mathcal{C}}$) with deterministic transition function can be constructed in
exponential time with an objective $\phi'$ such that the state space of $G$ is a subset of the
state space of $G'$ and the following assertions hold.

1. For all $s \in S$ there is an observation-based almost-sure (resp. positive) winning
strategy from $s$ for $\phi$ in $G$ iff there is an observation-based almost-sure (resp. positive)
winning strategy for $\phi'$ from $s$ in $G'$.

2. For all $s \in S$ we have $\langle\!\langle 1 \rangle\!\rangle_{val}^{G}(\phi)(s) = \langle\!\langle 1 \rangle\!\rangle_{val}^{G'}(\phi')(s)$. For all $s \in S$ there is an
observation-based optimal strategy for $\phi$ from $s$ in $G$ iff there is an observation-based
optimal strategy for $\phi'$ from $s$ in $G'$.

Proof. First, we present the proof for $a \neq Co$, assuming that Player 2 has complete
observation. Let $G = \langle S_A \cup S_P, A_1, A_2, \delta, O_1 \rangle$ and assume w.l.o.g. (according to
Theorem 1) that $G$ satisfies interaction separation (i.e., states in $S_A$ are deterministic states,
and states in $S_P$ are probabilistic states) and $G$ is uniform-n-ary, i.e., all probabilities are equal
to $\frac{1}{n}$. For each probabilistic state $s \in S_P$, let $Succ(s) = (s'_0, \dots, s'_{n-1})$ be the $n$-tuple
of states such that $\delta(s, -, -)(s'_i) = \frac{1}{n}$ for each $0 \le i < n$.

We present a reduction that replaces the probabilistic states in $G$ by a gadget with
Player-1 and Player-2 turn-based states. From $G$, we construct the one-sided
complete-observation game $G'$ where Player 2 has complete observation. A similar construction
where Player 1 instead of Player 2 has complete observation is obtained symmetrically.
The game $G' = \langle S', A'_1, A'_2, \delta', O'_1 \rangle$ is defined as follows: $S' = S \cup (S \times [n]) \cup \{sink\}$,
$A'_1 = A_1 \cup [n]$, $A'_2 = A_2 \cup [n]$, $O'_1 = \{ o \cup \{(s, i) \mid s \in o\} \mid o \in O_1 \}$, and $\delta'$ is obtained
from $\delta$ by applying the following transformation for each state $s \in S$:

1. if $s$ is a deterministic state in $G$, then $\delta'(s, a, b) = \delta(s, a, b)$ for all $a \in A_1$, $b \in A_2$,
and $\delta'(s, -, j) = \delta'(s, i, -) = sink$ for all $i, j \in [n]$;

2. if $s$ is a probabilistic state in $G$, then $s$ is a Player-2 state in $G'$ and for all $i, j \in [n]$
we define $\delta'(s, -, i) = (s, i)$ and $\delta'((s, i), j, -) = s'_k$, where $s'_k$ is the element in
position $k$ in $Succ(s)$ with $k = i + j \bmod n$ (and let $\delta'(s, -, b) = \delta'((s, i), a, -) =
\delta'(sink, -, -) = sink$ for all $a \in A_1$, $b \in A_2$).

Note that turn-based states in $G$ remain turn-based in $G'$ and the states $(s, \cdot)$ are
Player-1 states with the same observation as $s$. A sequence of observations $o_1, \dots, o_m$ in
$G$ corresponds to the sequence $o_1, o_2, o_2, o_3, o_4, o_4, \dots, o_m$ in $G'$ because deterministic
and probabilistic states alternate in $G$, and in $G'$, transitions from probabilistic states
have intermediate states with duplicated observation. The objective $\varphi'$ is defined as
the set $\{ o_1, o_2, o_2, o_3, o_4, o_4, \dots \mid o_1, o_2, \dots \in \varphi \}$. Intuitively, each player in $G'$ has the
possibility to force faithful simulation of the probabilistic states of $G$ by playing actions
in $[n]$ uniformly at random. For instance, if Player 1 does so, then irrespective of the
(possibly randomized) choice of Player 2 among the states $(s, 1), \dots, (s, n)$, the states
in $Succ(s)$ are reached with probability $1/n$, as in $G$. And the same holds if Player 2
plays in $[n]$ uniformly at random, no matter what Player 1 does. Therefore, Player 1 can
achieve the objective $\varphi'$ in $G'$ with the same probability as for $\varphi$ in $G$, but not more.

The above reduction can be easily adapted to the case $a = Pa$ of games with partial
information for both players. ∎
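The gadget of the proof can be checked on a small instance. The following is a minimal sketch in Python, not part of the construction itself: the function name `successor_distribution`, the state names `t0`, `t1`, `t2`, and the particular distributions are illustrative assumptions. It computes the distribution over $Succ(s)$ induced by a choice of Player 2 at $s$ and of Player 1 at the states $(s, i)$, and it also shows why the shared observation matters: a Player 1 who could see $i$ would be able to force a successor.

```python
from fractions import Fraction
from itertools import product

def successor_distribution(succ, p2_dist, p1_dists):
    """Distribution over Succ(s) in the turn-based gadget (hypothetical helper).

    p2_dist[i]     : probability that Player 2 plays i at state s
    p1_dists[i][j] : probability that Player 1 plays j at state (s, i)
    """
    n = len(succ)
    dist = {t: Fraction(0) for t in succ}
    for i, j in product(range(n), repeat=2):
        # delta'(s, -, i) = (s, i) and delta'((s, i), j, -) = Succ(s)[(i + j) mod n]
        dist[succ[(i + j) % n]] += p2_dist[i] * p1_dists[i][j]
    return dist

succ = ("t0", "t1", "t2")            # Succ(s) for n = 3 (illustrative names)
uniform = [Fraction(1, 3)] * 3
skewed = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
point = [Fraction(1), Fraction(0), Fraction(0)]

# Player 1 uniform at every (s, i): Player 2's choice of i is irrelevant.
d1 = successor_distribution(succ, skewed, [uniform] * 3)
# Player 2 uniform at s: an observation-based Player 1 uses the same row for all i.
d2 = successor_distribution(succ, uniform, [point] * 3)
# If Player 1 could see i, playing j = (0 - i) mod 3 would force t0.
informed = [[Fraction(int(j == (-i) % 3)) for j in range(3)] for i in range(3)]
d3 = successor_distribution(succ, uniform, informed)
```

In this sketch `d1` and `d2` are uniform over the three successors, whereas `d3` puts all the mass on `t0`, which is exactly the advantage that the duplicated observation removes.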

Role of probabilistic transition in CoT games and POMDPs. We have already shown 
that for CoC games and OsT games, randomness in transition can be obtained for free. 
We complete the picture by showing that for CoT (complete-observation turn-based) 
games randomness in transition cannot be obtained for free. It follows from the result 



Fig. 6. For the probabilistic state $s$ (on the left), we have $Succ(s) = (s'_0, s'_1, s'_2)$ and
$n = 3$ is the gcd of the probabilities denominators. Therefore, we apply the reduction
of Theorem 3 to obtain the turn-based game on the right, where $s$ is a Player-2 state.





             |          2½-player             |  1½-player
             | complete  one-sided  partial   |  MDP    POMDP
turn-based   | not       free       free      |  not    not
concurrent   | free      free       free      |  (NA)   (NA)



Table 1. When randomness is for free in the transition function. In particular, proba- 
bilities can be eliminated in all classes of 2-player games except complete-observation 
turn-based games. 



of Martin [12] that for all CoT deterministic games and all objectives, the values are
either 1 or 0; however, MDPs with reachability objectives can have values in the interval
[0, 1] (not only the values 0 and 1). Thus the result follows for CoT games. It also follows
that the claim that "randomness in transitions" can be replaced by "randomness in strategies"
is not true: in CoT deterministic games even with randomized strategies the values are either 1
or 0 [12], whereas MDPs can have values in the interval [0, 1]. For POMDPs, we show
in Theorem 5 that pure strategies are sufficient, and it follows that for POMDPs with
deterministic transition function the values are 0 or 1; since MDPs with reachability
objectives can have values other than 0 and 1, it follows that randomness in transition
cannot be obtained for free for POMDPs. The probabilistic transition also plays an
important role in the complexity of solving CoT games: for example,
CoT deterministic games with reachability objectives can be solved in linear time, but
for probabilistic transition the problem lies in NP ∩ coNP and no polynomial-time
algorithm is known. In contrast, for CoC games we present a polynomial-time reduction
from probabilistic transition to deterministic transition. Table 1 summarizes our results
characterizing the classes of games where randomness in transition can be obtained for
free.
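To illustrate the contrast between values in {0, 1} and values in [0, 1], the following is a minimal sketch of the standard attractor computation for reachability in a deterministic turn-based game. It is a textbook technique, not taken from this paper; the function name `attractor`, the four-state game, and the encoding by `owner` and `edges` are assumptions made for the example, and every state is assumed to have at least one successor.

```python
from collections import deque
from fractions import Fraction

def attractor(states, owner, edges, target):
    """States from which Player 1 can force a visit to `target`.

    owner[s] is 1 or 2 and edges[s] lists the successors of s. States in the
    result have value 1 for the reachability objective; all others have value 0.
    Each edge is inspected a bounded number of times (linear time).
    """
    preds = {s: [] for s in states}
    for s in states:
        for t in edges[s]:
            preds[t].append(s)
    # For Player-2 states: number of successors not yet known to be winning.
    remaining = {s: len(edges[s]) for s in states}
    win = set(target)
    queue = deque(target)
    while queue:
        t = queue.popleft()
        for s in preds[t]:
            if s in win:
                continue
            remaining[s] -= 1
            if owner[s] == 1 or remaining[s] == 0:
                win.add(s)
                queue.append(s)
    return win

states = ["a", "b", "goal", "trap"]
edges = {"a": ["b"], "b": ["goal", "trap"], "goal": ["goal"], "trap": ["trap"]}
# b controlled by Player 2: Player 1 cannot force the goal from a (value 0).
adversarial = attractor(states, {"a": 1, "b": 2, "goal": 1, "trap": 1}, edges, ["goal"])
# b controlled by Player 1: the goal is forced from a (value 1).
cooperative = attractor(states, {"a": 1, "b": 1, "goal": 1, "trap": 1}, edges, ["goal"])
# If b is instead a fair coin flip (an MDP), the value at a is neither 0 nor 1.
coin_value = Fraction(1, 2) * 1 + Fraction(1, 2) * 0
```

Whoever controls `b`, the deterministic game has value 0 or 1 at `a`, while the probabilistic variant has value 1/2.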



4 Randomness for Free in Strategies 



It is known from the results of IE\ that in CoC games randomized strategies are more 
powerful than pure strategies; for example, values achieved by pure strategies are lower 
than values achieved by randomized strategies and randomized almost-sure winning 
strategies may exist whereas no pure almost-sure winning strategy exists. Similar results 
also hold in the case of OsT games (see [6] for an example). By contrast we show that
in one-player games, restricting the set of strategies to pure strategies does not decrease
the value nor affect the existence of almost-sure and positive winning strategies. We
first start with a lemma, then present a result that can be derived from Martin's theorem
for Blackwell games [12], and finally present our results precisely in a theorem.

Lemma 1. Let $G$ be a POMDP with initial state $s_*$ and an objective $\varphi \subseteq S^\omega$. Then for
every randomized observation-based strategy $\sigma \in \Sigma_O$ there exists a pure
observation-based strategy $\sigma_P \in \Sigma_P \cap \Sigma_O$ such that:

$$\Pr\nolimits^{\sigma}_{s_*}(\varphi) \le \Pr\nolimits^{\sigma_P}_{s_*}(\varphi). \tag{1}$$

Proof. Let $G = \langle S, A, \delta, O \rangle$ be a POMDP. Let $\sigma : O^* \to \mathcal{D}(A)$ be a randomized
observation-based strategy and fix $s_* \in S$ an initial state.

To simplify notations, we suppose that $A = \{0, 1\}$ contains only two actions, and
that given a state $s \in S$ and an action $a \in \{0, 1\}$ there are only two possible successors
$L(s, a) \in S$ and $R(s, a) \in S$ chosen with respective probabilities $\delta(s, a, L(s, a))$ and
$\delta(s, a, R(s, a)) = 1 - \delta(s, a, L(s, a))$. The proof for an arbitrary finite set of actions
and more than two successors is essentially the same, with more complicated notations.

There is a natural way to "derandomize" the randomized strategy $\sigma$. Fix an infinite
sequence $x = (x_n)_{n \in \mathbb{N}} \in [0, 1]^\omega$ and define the deterministic strategy $\sigma_x$ as follows.
For every $o_0 o_1 \cdots o_n \in O^*$,

$$\sigma_x(o_0 o_1 \cdots o_n) = \begin{cases} 0 & \text{if } x_n \le \sigma(o_0 o_1 \cdots o_n)(0) \\ 1 & \text{otherwise.} \end{cases}$$

Intuitively, the sequence $x$ fixes in advance the sequence of results of coin tosses used
for playing with $\sigma$.

To prove the lemma, we show that $[0, 1]^\omega$ can be equipped with a probability measure
$\nu$ such that the mapping $x \mapsto \Pr^{\sigma_x}_{s_*}(\varphi)$ from $[0, 1]^\omega$ to $[0, 1]$ is measurable and:

$$\Pr\nolimits^{\sigma}_{s_*}(\varphi) = \int_{x \in [0,1]^\omega} \Pr\nolimits^{\sigma_x}_{s_*}(\varphi) \, d\nu(x). \tag{2}$$

Suppose that (2) holds. Then there exists $x \in [0, 1]^\omega$ (actually many $x$'s) such that
$\Pr^{\sigma}_{s_*}(\varphi) \le \Pr^{\sigma_x}_{s_*}(\varphi)$ and since strategy $\sigma_x$ is deterministic, this proves the lemma.

To complete the proof of Lemma 1 it is thus enough to construct a probability
measure $\nu$ on $[0, 1]^\omega$ such that (2) holds.

We start with the definition of the probability measure $\nu$. The set $[0, 1]^\omega$ is equipped
with the $\sigma$-field generated by sequence-cylinders which are defined as follows. For
every finite sequence $x = x_0, x_1, \dots, x_n \in [0, 1]^*$ the sequence-cylinder $\mathcal{O}(x)$ is the
subset $[0, x_0] \times [0, x_1] \times \dots \times [0, x_n] \times [0, 1]^\omega \subseteq [0, 1]^\omega$. According to Tulcea's theorem [4],



there is a unique product probability measure $\nu$ on $[0, 1]^\omega$ such that $\nu(\mathcal{O}(\epsilon)) = 1$ and
for every sequence $x_0, \dots, x_n, x_{n+1}$ in $[0, 1]$,

$$\nu(\mathcal{O}(x_0, \dots, x_n, x_{n+1})) = x_{n+1} \cdot \nu(\mathcal{O}(x_0, \dots, x_n)).$$

Now that $\nu$ is defined, it remains to prove that the mapping $x \mapsto \Pr^{\sigma_x}_{s_*}(\varphi)$ from
$[0, 1]^\omega$ to $[0, 1]$ is measurable and that (2) holds. For that, we introduce the following
mapping:

$$f_{s_*,\sigma} : [0, 1]^\omega \times [0, 1]^\omega \to (SA)^\omega$$

that associates with every pair of sequences $((x_n)_{n \in \mathbb{N}}, (y_n)_{n \in \mathbb{N}})$ the infinite history
$h = s_0 a_1 s_1 a_2 \cdots \in (SA)^\omega$ defined recursively as follows. First $s_0 = s_*$ and for every
$n \in \mathbb{N}$,

$$a_{n+1} = \begin{cases} 0 & \text{if } x_n \le \sigma(obs(s_0 \cdots s_n))(0) \\ 1 & \text{otherwise,} \end{cases}
\qquad
s_{n+1} = \begin{cases} L(s_n, a_{n+1}) & \text{if } y_n \le \delta(s_n, a_{n+1}, L(s_n, a_{n+1})) \\ R(s_n, a_{n+1}) & \text{otherwise.} \end{cases}$$

Intuitively, $(x_n)_{n \in \mathbb{N}}$ fixes in advance the coin tosses used by the strategy, while
$(y_n)_{n \in \mathbb{N}}$ takes care of coin tosses used by the probabilistic transitions, and $f_{s_*,\sigma}$
produces the resulting description of the play. Thanks to the mapping $f_{s_*,\sigma}$, randomness
related to the use of the randomized strategy $\sigma$ is separated from randomness due to
transitions of the game, which allows us to represent the randomized strategy $\sigma$ by means
of a probability measure over the set of deterministic strategies $\{\sigma_x \mid x \in [0, 1]^\omega\}$.

We equip both sets $(SA)^\omega$ and $[0, 1]^\omega \times [0, 1]^\omega$ with $\sigma$-fields that make $f_{s_*,\sigma}$
measurable. First, $(SA)^\omega$ is equipped with the $\sigma$-field generated by cylinders, defined as
follows. An action-cylinder is any subset $\mathcal{O}(h) \subseteq (SA)^\omega$ such that $\mathcal{O}(h) = h(SA)^\omega$
for some $h \in (SA)^*$. A state-cylinder is any subset $\mathcal{O}(h) \subseteq (SA)^\omega$ such that
$\mathcal{O}(h) = h(AS)^\omega$ for some $h \in (SA)^*S$. The set of cylinders is the union of the sets
of action-cylinders and state-cylinders. Second, $[0, 1]^\omega \times [0, 1]^\omega$ is equipped with the
$\sigma$-field generated by products of sequence-cylinders. Checking that $f_{s_*,\sigma}$ is measurable
is an elementary exercise.

Now we define two probability measures $\mu$ and $\mu'$ on $(SA)^\omega$ and prove that they
coincide.

On one hand, the measurable mapping $f_{s_*,\sigma} : [0, 1]^\omega \times [0, 1]^\omega \to (SA)^\omega$ defines
naturally a probability measure $\mu'$ on $(SA)^\omega$. Equip the set $[0, 1]^\omega \times [0, 1]^\omega$ with the
product measure $\nu \times \nu$. Then for every measurable subset $B \subseteq (SA)^\omega$,

$$\mu'(B) = (\nu \times \nu)(f^{-1}_{s_*,\sigma}(B)).$$

On the other hand, the strategy $\sigma$ and the initial state $s_*$ naturally define another
probability measure $\mu$ on $(SA)^\omega$. According to Tulcea's theorem [4], there exists a unique
product probability measure $\mu$ on $(SA)^\omega$ such that $\mu(\mathcal{O}(s_*)) = 1$, $\mu(\mathcal{O}(s)) = 0$ for
$s \in S \setminus \{s_*\}$, and for $h = s_0 a_1 s_1 a_2 \cdots s_n \in (SA)^*S$ and $(a, t) \in A \times S$,

$$\mu(\mathcal{O}(ha)) = \sigma(obs(s_0 \cdots s_n))(a) \cdot \mu(\mathcal{O}(h))$$
$$\mu(\mathcal{O}(hat)) = \delta(s_n, a, t) \cdot \mu(\mathcal{O}(ha)).$$



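We have defined $f_{s_*,\sigma}$ in such a way that $\mu$ and $\mu'$ coincide. To prove that $\mu$ and $\mu'$
coincide, it is enough to prove that they coincide on the set of cylinders, that is, for
every cylinder $\mathcal{O}(h) \subseteq (SA)^\omega$,

$$\mu(\mathcal{O}(h)) = (\nu \times \nu)(f^{-1}_{s_*,\sigma}(\mathcal{O}(h))). \tag{3}$$

For $h = s_*$, or $h = s \in S \setminus \{s_*\}$, (3) is obvious. The general case goes by
induction. Let $h = s_0 a_1 s_1 a_2 \cdots s_n \in (SA)^*S$ and $(a, t) \in A \times S$. Let $I = [0, 1]$. Let
$I_a = [0, \sigma(h)(a)]$ if $a = 0$ and $I_a = [1 - \sigma(h)(a), 1]$ if $a = 1$. Let $I_t = [0, \delta(s_n, a, t)]$ if
$t = L(s_n, a)$ and $I_t = [1 - \delta(s_n, a, t), 1]$ if $t = R(s_n, a)$, so that $I_a$ has length $\sigma(h)(a)$
and $I_t$ has length $\delta(s_n, a, t)$. Then:

$$\mu(\mathcal{O}(ha) \mid \mathcal{O}(h)) = \sigma(h)(a)
= (\nu \times \nu)\big((I^n \times I_a \times I^\omega) \times I^\omega\big)
= (\nu \times \nu)\big(f^{-1}_{s_*,\sigma}(\mathcal{O}(ha)) \mid f^{-1}_{s_*,\sigma}(\mathcal{O}(h))\big)$$

$$\mu(\mathcal{O}(hat) \mid \mathcal{O}(ha)) = \delta(s_n, a, t)
= (\nu \times \nu)\big(I^\omega \times (I^n \times I_t \times I^\omega)\big)
= (\nu \times \nu)\big(f^{-1}_{s_*,\sigma}(\mathcal{O}(hat)) \mid f^{-1}_{s_*,\sigma}(\mathcal{O}(ha))\big)$$

which proves that (3) holds for every cylinder $\mathcal{O}(h)$.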

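Now all the tools needed to prove (2) have been introduced, and we can state the
main relation between $f_{s_*,\sigma}$ and $\Pr^{\sigma}_{s_*}(\varphi)$. Let $\varphi' \subseteq (SA)^\omega$ be the set of histories
$s_0 a_1 s_1 \cdots$ such that $s_0 s_1 \cdots \in \varphi$, and let $\mathbf{1}_{\varphi}$ and $\mathbf{1}_{\varphi'}$ be the indicator functions of
$\varphi$ and $\varphi'$. Then:

$$\begin{aligned}
\Pr\nolimits^{\sigma}_{s_*}(\varphi)
&= \int_{p \in S^\omega} \mathbf{1}_{\varphi}(p) \, d\Pr\nolimits^{\sigma}_{s_*}(p)
 = \int_{p \in (SA)^\omega} \mathbf{1}_{\varphi'}(p) \, d\mu(p)
 = \int_{p \in (SA)^\omega} \mathbf{1}_{\varphi'}(p) \, d\mu'(p) \\
&= \int_{(x,y) \in [0,1]^\omega \times [0,1]^\omega} \mathbf{1}_{\varphi'}(f_{s_*,\sigma}(x, y)) \, d(\nu \times \nu)(x, y) \\
&= \int_{x \in [0,1]^\omega} \left[ \int_{y \in [0,1]^\omega} \mathbf{1}_{\varphi'}(f_{s_*,\sigma}(x, y)) \, d\nu(y) \right] d\nu(x),
\end{aligned} \tag{4}$$

where the first and second equalities are by definition of $\Pr^{\sigma}_{s_*}(\varphi)$, the third equality
holds because $\mu = \mu'$, the fourth equality is a basic property of image measures, and
the fifth equality holds by Fubini's theorem [4], which we can use since $\mathbf{1}_{\varphi'} \circ f_{s_*,\sigma}$ is
positive.

To complete the proof, we prove that for every $x \in [0, 1]^\omega$,

$$\Pr\nolimits^{\sigma_x}_{s_*}(\varphi) = \int_{y \in [0,1]^\omega} \mathbf{1}_{\varphi'}(f_{s_*,\sigma}(x, y)) \, d\nu(y). \tag{5}$$

Equation (4) holds for every observation-based strategy $\sigma$, hence in particular for
strategy $\sigma_x$. But strategy $\sigma_x$ has the following property: for every $x' \in \, ]0, 1[^\omega$ and every
$y \in [0, 1]^\omega$, $f_{s_*,\sigma_x}(x', y) = f_{s_*,\sigma}(x, y)$. Together with (4), this gives (5). This
completes the proof, since (4) and (5) immediately give (2). ∎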
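The separation of the two sources of randomness can be made concrete on finite prefixes. The sketch below is only an illustration under stated assumptions: the two-state POMDP `DELTA`, the single-observation function `obs` (a blind controller, so an observation sequence is determined by its length), the strategy `sigma`, and the helper names `play` and `derandomize` are all hypothetical. It implements a finite prefix of $f_{s_*,\sigma}(x, y)$ and the pure strategy $\sigma_x$, and checks the property used in the last step of the proof: playing $\sigma_x$ with any coin tosses $x'$ strictly between 0 and 1 yields the same history as playing $\sigma$ with the coin tosses $x$.

```python
import random

# Hypothetical two-state POMDP with a single observation (a blind controller).
# DELTA[(s, a)] = (L, R, pL): left successor, right successor, probability of L.
DELTA = {
    ("u", 0): ("u", "v", 0.5),
    ("u", 1): ("v", "u", 0.9),
    ("v", 0): ("v", "u", 0.2),
    ("v", 1): ("u", "v", 0.7),
}

def obs(states):
    """Observation sequence of a state sequence; here only its length is visible."""
    return len(states)

def sigma(observation):
    """Randomized observation-based strategy: probability of playing action 0."""
    return 1.0 / (1 + observation)

def play(strategy, s0, x, y):
    """Finite prefix of f_{s0,strategy}(x, y): x drives actions, y drives transitions."""
    states, history = [s0], [s0]
    for xn, yn in zip(x, y):
        a = 0 if xn <= strategy(obs(states)) else 1
        left, right, p_left = DELTA[(states[-1], a)]
        t = left if yn <= p_left else right
        states.append(t)
        history += [a, t]
    return history

def derandomize(strategy, x):
    """Pure strategy sigma_x: resolve the n-th coin toss of `strategy` with x[n]."""
    def sigma_x(observation):
        n = observation - 1          # obs() returns the number of states seen so far
        return 1.0 if x[n] <= strategy(observation) else 0.0
    return sigma_x

rng = random.Random(0)
N = 20
x = [rng.random() for _ in range(N)]
y = [rng.random() for _ in range(N)]
x_prime = [0.5] * N                  # any values strictly between 0 and 1
same = play(derandomize(sigma, x), "u", x_prime, y) == play(sigma, "u", x, y)
```

Here `same` evaluates to `True` for any choice of `x` and `y`, because $\sigma_x$ assigns probability 0 or 1 to action 0 and the comparison with a value in $]0, 1[$ then ignores `x_prime`.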
Table 2. When deterministic ($\varepsilon$-optimal) strategies are as powerful as randomized
strategies. The case $\varepsilon = 0$ in complete-observation turn-based games is open.



Theorem 4 ([12]). Let $G$ be a CoT stochastic game with initial state $s_*$ and an
objective $\varphi \subseteq S^\omega$. Then the following equalities hold: $\inf_{\pi \in \Pi_O} \sup_{\sigma \in \Sigma_O} \Pr^{\sigma,\pi}_{s_*}(\varphi) = \cdots$

We obtain the following result as a consequence of Lemma 1.

Theorem 5. Let $G$ be a POMDP with initial state $s_*$ and an objective $\varphi \subseteq S^\omega$. Then
the following assertions hold:

1. $\sup_{\sigma \in \Sigma_O} \Pr^{\sigma}_{s_*}(\varphi) = \sup_{\sigma \in \Sigma_O \cap \Sigma_P} \Pr^{\sigma}_{s_*}(\varphi)$.

2. If there is a randomized optimal (resp. almost-sure winning, positive winning)
strategy for $\varphi$ from $s_*$, then there is a pure optimal (resp. almost-sure winning, positive
winning) strategy for $\varphi$ from $s_*$.

Theorem 4 can be derived as a consequence of Martin's proof of determinacy of
Blackwell games [12]: the result states that for CoT stochastic games pure strategies
can achieve the same value as randomized strategies, and as a special case the result
also holds for MDPs. Theorem 5 shows that the result can be generalized to POMDPs,
and a stronger result (item (2) of Theorem 5) can be proved for POMDPs (and MDPs
as a special case). It remains open whether a result similar to item (2) of Theorem 5 can
be proved for CoT stochastic games. The results on when randomness can
be obtained for free for strategies are summarized in Table 2.
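Item (1) of Theorem 5 can be observed directly in a finite-horizon toy case. The following is a minimal sketch, assuming a hypothetical blind POMDP `TRANS` with an absorbing `goal` state and a reachability objective within two steps; the names `reach_probability` and `randomized_value` are illustrative. The value of a randomized blind strategy is a weighted average of the values of the pure action sequences, so it cannot exceed the best pure one.

```python
from fractions import Fraction
from itertools import product

# Hypothetical blind POMDP: the controller never observes the state.
# TRANS[(state, action)] maps successor states to probabilities.
H = Fraction(1, 2)
TRANS = {
    ("s", "a"): {"s": H, "t": H},
    ("s", "b"): {"goal": Fraction(1, 3), "s": Fraction(2, 3)},
    ("t", "a"): {"t": Fraction(1)},
    ("t", "b"): {"goal": Fraction(1)},
    ("goal", "a"): {"goal": Fraction(1)},
    ("goal", "b"): {"goal": Fraction(1)},
}

def reach_probability(actions, start="s", target="goal"):
    """Probability of being in the absorbing `target` after the fixed action sequence."""
    dist = {start: Fraction(1)}
    for act in actions:
        nxt = {}
        for state, p in dist.items():
            for succ, q in TRANS[(state, act)].items():
                nxt[succ] = nxt.get(succ, Fraction(0)) + p * q
        dist = nxt
    return dist.get(target, Fraction(0))

def randomized_value(mix, horizon):
    """Value of the blind strategy that plays 'a' with probability mix[k] at step k."""
    total = Fraction(0)
    for actions in product("ab", repeat=horizon):
        weight = Fraction(1)
        for k, act in enumerate(actions):
            weight *= mix[k] if act == "a" else 1 - mix[k]
        total += weight * reach_probability(actions)
    return total

best_pure = max(reach_probability(seq) for seq in product("ab", repeat=2))
mixed = randomized_value([H, H], 2)
```

In this instance the best pure sequence is `a` then `b`, with value 2/3, while the uniformly mixing strategy only achieves 7/18.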

Undecidability result for POMDPs. The results of [2] show that the emptiness problem
for probabilistic coBüchi (resp. Büchi) automata under the almost-sure (resp.
positive) semantics [2] is undecidable. As a consequence it follows that for POMDPs the
problem of deciding if there is a pure observation-based almost-sure (resp. positive)
winning strategy for coBüchi (resp. Büchi) objectives is undecidable, and as a
consequence of Theorem 5 we obtain the same undecidability result for randomized
strategies. This result closes an open question discussed in [9]. The undecidability result
holds even if the coBüchi (resp. Büchi) objectives are visible.

Corollary 1. Let $G$ be a POMDP with initial state $s_*$ and let $T \subseteq S$ be a subset of
states (or subset of observations). Whether there exists a pure or randomized
almost-sure winning strategy for Player 1 from $s_*$ in $G$ for the objective coBüchi($T$) is
undecidable; and whether there exists a pure or randomized positive winning strategy for
Player 1 from $s_*$ in $G$ for the objective Büchi($T$) is undecidable.



Undecidability result for one-sided complete-observation turn-based games. The
undecidability results of Corollary 1 also hold for OsT stochastic games (as they
subsume POMDPs as a special case). It follows from Theorem 3 that OsT stochastic games
can be reduced to OsT deterministic games. Thus we obtain the first undecidability
result for OsT deterministic games (the following corollary), solving the open question

of a. 

Corollary 2. Let $G$ be an OsT deterministic game with initial state $s_*$ and let $T \subseteq S$
be a subset of states (or subset of observations). Whether there exists a pure or
randomized almost-sure winning strategy for Player 1 from $s_*$ in $G$ for the objective coBüchi($T$)
is undecidable; and whether there exists a pure or randomized positive winning strategy
for Player 1 from $s_*$ in $G$ for the objective Büchi($T$) is undecidable.

5 Conclusion 

In this work we have presented a precise characterization for classes of games where 
randomization can be obtained for free in transitions and in strategies. As a conse- 
quence of our characterization we obtain new undecidability results. The other impact 
of our characterization is as follows: for the class of games where randomization is 
free in transition, future algorithmic and complexity analysis can focus on the simpler 
class of deterministic games; and for the class of games where randomization is free in 
strategies, future analysis of such games can focus on the simpler class of deterministic 
strategies. Thus our results will be useful tools for simpler analysis techniques in the 
study of games. 
A Appendix



Gadget for uniform-n-ary probability reduction for Theorem 2. We now show how
to simulate a probabilistic state $s^*$, with $n$ successors $s_0, s_1, \dots, s_{n-1}$ such that the
transition probability is $1/n$ to each of the successors, by a concurrent deterministic state.
In the concurrent deterministic state $s^*$ there are $n$ actions $a_0, a_1, \dots, a_{n-1}$ available for
player 1 and $n$ actions $b_0, b_1, \dots, b_{n-1}$ available for player 2. The transition function
is as follows: for $0 \le i < n$ and $0 \le j < n$ we have $\delta(s^*, a_i, b_j) = s_{(i+j) \bmod n}$.
Intuitively, the transition function matrix is obtained as follows: the first row is filled
with states $s_0, s_1, \dots, s_{n-1}$, and from a row $i$, the row $i + 1$ is obtained by moving
the state of the first column of row $i$ to the last column in row $i + 1$ and left-shifting
by one position all the other states; the construction is illustrated on an example with
$n = 4$ successors in (6). The construction ensures that in every row and every column
each state $s_0, s_1, \dots, s_{n-1}$ appears exactly once. It follows that if player 1 plays all
actions uniformly at random, then against any probability distribution of player 2 the
successor states are $s_0, s_1, \dots, s_{n-1}$ with probability $1/n$ each; and a similar result
holds if player 2 plays all actions uniformly at random. The correctness of the reduction
for uniform-n-ary probabilistic states is then exactly as in the proof of Theorem 2.

$$\begin{pmatrix}
s_0 & s_1 & s_2 & s_3 \\
s_1 & s_2 & s_3 & s_0 \\
s_2 & s_3 & s_0 & s_1 \\
s_3 & s_0 & s_1 & s_2
\end{pmatrix} \tag{6}$$
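The row and column property of the matrix can be checked mechanically. The following is a minimal sketch, where the helper names `gadget_matrix` and `outcome` and the particular mixed action of player 2 are assumptions made for the example: it builds the transition matrix $\delta(s^*, a_i, b_j) = s_{(i+j) \bmod n}$ and computes the successor distribution for given mixed actions of the two players.

```python
from fractions import Fraction

def gadget_matrix(successors):
    """Transition matrix of the concurrent gadget: entry [i][j] is s_{(i+j) mod n}."""
    n = len(successors)
    return [[successors[(i + j) % n] for j in range(n)] for i in range(n)]

def outcome(matrix, row_dist, col_dist):
    """Successor distribution when player 1 mixes rows and player 2 mixes columns."""
    dist = {}
    for i, p in enumerate(row_dist):
        for j, q in enumerate(col_dist):
            dist[matrix[i][j]] = dist.get(matrix[i][j], Fraction(0)) + p * q
    return dist

M = gadget_matrix(["s0", "s1", "s2", "s3"])
uniform = [Fraction(1, 4)] * 4
arbitrary = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0)]

# Player 1 uniform against an arbitrary mixed action of player 2, and vice versa.
d_rows = outcome(M, uniform, arbitrary)
d_cols = outcome(M, arbitrary, uniform)
```

Both `d_rows` and `d_cols` assign probability 1/4 to each of the four successors, as the argument above predicts.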



